Abstract
They introduce an extension to the bag-of-words model for learning word representations that takes into account both syntactic and semantic properties of language.
Introduction
The problem with BOW and CBOW models is that they lack sensitivity to word order, which limits their ability to learn syntactically motivated embeddings.
In this work, they propose an extension to the continuous bag-of-words model, which adds an attention model that considers contextual words differently depending on the word type and its relative position to the predicted word (distance to the left/right).
For instance, in the sentence "We won the game! Nicely played!":

- the prediction of the word *played* depends both on the syntactic relation to *nicely*, which narrows down the list of candidates to verbs, and on the semantic relation to *game*, which narrows down the list of candidates to verbs related to games;
- the words *we* and *the* add very little to this particular prediction;
- on the other hand, the word *the* is important for predicting the word *game*, since it is generally followed by nouns.
Attention-Based Continuous Bag-of-words
CBOW with attention
It has an additional weight matrix $K \in \mathbb{R}^{|V| \times 2b}$, where $b$ is the window size on each side of the predicted word. This is a set of parameters that determines the importance of each word type at each relative position.
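To make the role of $K$ concrete, here is a minimal NumPy sketch of computing an attention-weighted context vector in place of CBOW's uniform average. The softmax normalization over context positions and all variable names (`E`, `K`, `context`) are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)

V, d, b = 10, 4, 2               # vocab size, embedding dim, half window size
E = rng.normal(size=(V, d))      # input word embeddings
K = rng.normal(size=(V, 2 * b))  # importance of each word type at each of the 2b relative positions

# context word ids at relative positions [-b, ..., -1, +1, ..., +b]
context = [3, 7, 1, 5]

# raw importance score for each context word: K[word type, position slot]
scores = np.array([K[w, p] for p, w in enumerate(context)])

# softmax over context positions (assumed normalization)
attn = np.exp(scores) / np.exp(scores).sum()

# attention-weighted context vector; plain CBOW would use attn = 1 / (2 * b)
s = (attn[:, None] * E[context]).sum(axis=0)
print(s.shape)  # prints (4,)
```

With all weights in $K$ equal, the softmax reduces to the uniform weights of standard CBOW, so the model strictly generalizes it.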
Experiments
The model outperforms other methods on part-of-speech induction. It performs similarly to the structured skip-gram model on part-of-speech tagging, while being faster to train.
They also evaluate the model on the Movie Review dataset, but it does not perform as well as CBOW and skip-gram. They attribute this to their model leaning more towards syntax.